Learning K-way D-dimensional Discrete Code For Compact Embedding Representations
Abstract
Embedding methods such as word embedding have become pillars for many applications involving discrete structures. Conventional embedding methods directly associate each symbol with a continuous embedding vector, which is equivalent to applying a linear transformation to the “one-hot” encoding of the discrete symbols. Despite its simplicity, this approach yields a number of parameters that grows linearly with the vocabulary size and can lead to overfitting. In this work we propose a much more compact K-way D-dimensional discrete encoding scheme to replace the “one-hot” encoding. In “KD encoding”, each symbol is represented by a D-dimensional code in which each dimension has a cardinality of K. The final symbol embedding vector is generated by composing the embedding vectors of its code components. To learn semantically meaningful codes, we derive a relaxed discrete optimization technique based on stochastic gradient descent. By adopting the new coding scheme, the efficiency of parameterization improves significantly (from linear to logarithmic in the vocabulary size), which also mitigates overfitting. In our experiments with language modeling, the number of embedding parameters is reduced by 97% while achieving similar or better performance.
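To make the composition concrete, here is a minimal NumPy sketch of how a learned KD code is turned into a symbol embedding, and how the parameter count compares with a conventional embedding table. All names and sizes (V, K, D, d, compose_embedding) are illustrative assumptions, not the paper's exact configuration or released code; in particular, the paper learns the codes themselves end to end, which this sketch takes as given.

```python
import numpy as np

# Illustrative sizes, not the paper's exact setup:
# vocab size V, cardinality K per code dimension, code length D, embedding dim d.
V, K, D, d = 100_000, 32, 32, 128

# Each symbol gets a D-dimensional discrete code with entries in {0, ..., K-1}.
# Random here; in the paper the codes are learned with a relaxed discrete
# optimization trained by stochastic gradient descent.
codes = np.random.randint(0, K, size=(V, D))

# One small embedding table per code dimension: D * K vectors instead of V.
code_table = 0.01 * np.random.randn(D, K, d)

def compose_embedding(symbol_id: int) -> np.ndarray:
    """Compose a symbol's embedding by summing its D code-component vectors."""
    code = codes[symbol_id]                             # shape (D,)
    return code_table[np.arange(D), code].sum(axis=0)   # shape (d,)

vec = compose_embedding(42)

# Parameter comparison (ignoring any extra composition network):
#   conventional table: V * d     = 12,800,000 floats
#   KD encoding:        D * K * d =    131,072 floats
print(vec.shape, V * d, D * K * d)
```

With these illustrative sizes the code tables hold roughly 1% of the parameters of the full V × d table, consistent with the logarithmic scaling claimed above, since D need only grow like log_K(V) to give every symbol a distinct code.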
Similar Resources
Max Margin Dimensionality Reduction
A fundamental problem in machine learning is to extract compact but relevant representations of empirical data. Relevance can be measured by the ability to make good decisions based on the representations, for example in terms of classification accuracy. Compact representations can lead to more human-interpretable models, as well as improve scalability. Furthermore, in multi-class and multi-task...
Supervised Manifold Learning for Media Interestingness Prediction
In this paper, we describe the models designed for automatically selecting multimedia data, e.g., image and video segments, which are considered to be interesting for a common viewer. Specifically, we utilize an existing dimensionality reduction method called Neighborhood MinMax Projections (NMMP) to extract the low-dimensional features for predicting the discrete interestingness labels. Meanwhile...
Taste Space Versus the World: an Embedding Analysis of Listening Habits and Geography
Probabilistic embedding methods provide a principled way of deriving new spatial representations of discrete objects from human interaction data. The resulting assignment of objects to positions in a continuous, low-dimensional space not only provides a compact and accurate predictive model, but also a compact and flexible representation for understanding the data. In this paper, we demonstrate...
Variable Elimination in the Fourier Domain
The ability to represent complex high dimensional probability distributions in a compact form is one of the key insights in the field of graphical models. Factored representations are ubiquitous in machine learning and lead to major computational advantages. We explore a different type of compact representation based on discrete Fourier representations, complementing the classical approach based...
Variable Elimination in Fourier Domain
Probabilistic inference is a key computational challenge in statistical machine learning and artificial intelligence. The ability to represent complex high dimensional probability distributions in a compact form is the most important insight in the field of graphical models. In this paper, we explore a novel way to exploit compact representations of highdimensional probability distributions in ...
Journal: CoRR
Volume: abs/1711.03067
Published: 2017